ABSTRACT


The structure of microprogrammed processors, and microprogramming in general,
is largely determined by two factors: the state of (semiconductor) technology and the
task of emulation. This article first reviews those technological advances as well as
those constraints and demands imposed by the emulation process that have shaped the
evolution of microprogramming.

The other main theme of this article is that it is a fruitless exercise to try to
characterize and understand microprogramming in terms of how it differs from ‘regular’
programming. The right approach to understanding microprogramming is to recognize
that it is primarily applied to the task of emulation (interpretation). Through this
approach the evolution of microprogramming, independent of a particular technology
and type of instruction set being emulated, will be reviewed and future trends
indicated.


1.  INTRODUCTION 


The  structure  of  microprogrammed  processors,  and  microprogramming  in  general, 
is  largely  determined  by  two  factors:  the  state  of  (semiconductor)  technology  and  the 
task of emulation. Therefore, this article first reviews those technological advances as
well as those constraints and demands imposed by the emulation process that have
shaped the evolution of microprogramming. The remainder of this article then uses
these observations to put the past developments of microprogramming in perspective
and forecast the major developments in the years ahead.

The other main theme of this article is that it is a fruitless exercise to try to
characterize and understand microprogramming in terms of how it differs from ‘regular’
programming. The futility of this approach can be seen in the numerous
contradictory definitions of microprogramming in the literature [Rosin, 1969; Wilkes,
1969; Mallach, 1972]. Attempts to base a definition on features of a processor’s
architecture, such as horizontal instruction formats, lack of an explicit program counter,
or visibility of real registers and data paths; or on features of a processor’s realization,
such as the speed of main memory relative to that of the control (micro-) memory, are easily
rejected on the basis of existing processors that are commonly recognized to be
microprogrammed processors yet do not possess the required features.

Most of the confusion among alternative definitions of microprogramming comes from
the fact that it has been used in two very different ways: (1) in a technological manner
to economically implement a complex instruction set or a small number of different
instruction sets on a single processor, and (2) in a software manner to provide
programmers with an extra degree of representational freedom, i.e. to develop multiple
instruction sets, each one appropriate for a particular task domain. The technological
use of microprogramming was the dominant justification for the development of
microprogrammable processors in the 1960’s. But as the cost of software began to
become the major cost of a computer system, the use of microprogramming as a
technique for making a computer more convenient to program has become, and will
continue to become, the more important application.

The  right  approach  to  understanding  microprogramming  is  to  recognize  that  it  is 
primarily  applied  to  the  task  of  emulation  (interpretation).  Through  this  approach  it  is 
possible to understand and predict the evolution of microprogramming independent of a
particular  technology  and  type  of  instruction  set  being  emulated. 

The process of emulation will be taken up in considerably more depth in Section
3, but it will be useful here in the introduction to briefly look at the different
processors used to emulate a BASIC machine. On the one hand there are the Hewlett-
Packard 2100, DEC PDP-11, and PDP-8 that have time-sharing systems supporting
BASIC. The only language available to the user is BASIC and he has no way of
knowing the architecture of the processor. On the other hand there are the BASIC
programmable calculators available from Hewlett-Packard [Spagler, 1972] and Wang
Laboratories that operate as BASIC machines: the input keys and the displays are
tailored to the BASIC language. It is difficult to insist that the HP-2100, PDP-11, and
PDP-8 are not microprogrammed processors while the ‘hidden’ processors in the HP
and Wang BASIC calculators are microprogrammed. The only characteristic all these
processors have in common is that they are emulating BASIC, and a good case can be
made for dropping the term ‘microprogramming’ altogether and simply using ‘emulation’
in its place. However, we will continue to use the term ‘microprogramming’ here since
it is so widely used and it is a convenient way to indicate that we are discussing
programming as it applies to emulation (and interpretation) rather than programming in
general.

Following our discussion of technology and emulation, this article then discusses
specific hardware and software techniques for emulation. A number of different types
of microprogrammed processors are also included as examples.

2.  SEMICONDUCTOR  TECHNOLOGY 

The state of the art in semiconductor electronics has had a profound effect on
the feasibility of microprogramming. Prior to the 1960’s the only effective means of
implementing a high speed control store was to use a diode matrix. This was the
technology used by Whirlwind I [Everett, 1951] and by Wilkes in his original paper on
microprogramming [Wilkes and Stringer, 1953]. Figures 2.1 and 2.2 show the structure
of these control units. As long as these diodes were discrete components a control
store of any reasonable size was too expensive to compete with alternative
implementations using random logic (e.g. about 3E,000 bits of control storage are
required to implement the full PDP-11/40 architecture while the Whirlwind I had only
4,800 ‘bits’ in its control store). It is important to realize that both of these structures
are just the control part of the processor and are an alternative to conventional
sequential control circuits as shown in Figure 2.3. It was not until the middle and late
1960’s that integrated-circuit technology advanced to the point that economic
read-only memories (ROMs) and read-write memories (RAMs) became a practical reality. It
stands to the credit of IBM’s engineers that they were able to develop the IBM
System/360 series of machines via microprogramming in the early 1960’s; every model
in the early IBM 360 line used a different, non-semiconductor technique to implement
its control store. These ingenious, but admittedly cumbersome and costly, techniques
could be laid aside when the IBM 370 series of machines was implemented, since
integrated circuit technology had advanced to the stage that semiconductor control
stores were reliable. Figure 2.4 illustrates the basic structure of current
microprogram control units.

Semiconductor memories suitable for control stores in microprogrammed
processors are now at the stage where 256 bit/package RAMs and 1K (1024)
bit/package ROMs are in wide use in present processors and 1K RAMs and 4K ROMs
are being designed into the newer processors. 4K RAMs and 16K ROMs have been
announced and are available in limited quantities, but in general they are too slow to
be seriously considered for control stores.

For well over 10 years now semiconductor manufacturers have set a pace
where the commercially feasible chip complexity (i.e., number of devices per chip) has
roughly doubled every one to two years. For example, the 4K bit/package RAM (13,000
devices) was introduced roughly two and one half years after the 1K bit (4,000 devices) RAM.
There is every reason to believe that this trend will continue for at least the next four
to six years. Hence we face a situation where we can expect to see the size of
control stores growing as technology encourages designers to use more control
storage to cut costs in other areas, improve the performance of the microprocessors,
or add additional capabilities.
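
The growth rate quoted above can be checked with a short calculation. The following is only a sketch of the arithmetic: the 1K and 4K package sizes, the 13,000 device count, and the two and one half year interval are taken from the text, while the figure of roughly 4,000 devices for the 1K RAM is an assumed value and the function name is ours.

```python
import math


def doubling_time_years(start, end, elapsed_years):
    """Years per doubling implied by growth from start to end over elapsed_years."""
    doublings = math.log2(end / start)
    return elapsed_years / doublings


# Bits per package: 1K -> 4K in about two and one half years (figures from the text).
bits = doubling_time_years(1024, 4096, 2.5)

# Devices per chip: 13,000 is from the text; 4,000 is an assumed figure.
devices = doubling_time_years(4000, 13000, 2.5)

print(f"bits per package double every {bits:.2f} years")
print(f"devices per chip double every {devices:.2f} years")
```

Both results fall inside the "one to two years" range stated above, so the example is consistent with the claimed trend.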

Memory arrays are not the only development in semiconductor technology that
is having a significant effect on the structure of microprogrammed processors. Two
other very important developments are the programmable logic array (PLA) and
shifters. The basic structure of a PLA is shown in Figure 2.5. It is a two-level
combinatorial logic circuit that is ‘wired’ for a specific application by the masking, or
metalization, that is used. The PLA has the same outward characteristics as a ROM
except that it would take a ROM with several orders of magnitude more devices to
match the function of the PLA in many applications. For example, a common PLA is a
Rockwell Corporation package with 48 input/output terminals [Rockwell, 1973]. A ROM
that would be equivalent to this PLA in many applications would require two orders of
magnitude more bits. A PLA uses the same techniques that designers of digital circuits
used a decade ago to minimize the number of gates required to realize a combinatorial
function. However, if the function to be implemented is sufficiently ill-conditioned (e.g.,
a parity tester), the PLA offers no advantage over a ROM. Instruction decoding is an
example of a combinatorial function amenable to minimization techniques and hence
PLAs will be very useful for providing the decoding of instructions that must otherwise
be done with random logic or via a sequence of microinstructions.
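
The difference between a PLA and a ROM can be made concrete with a small model. The sketch below is illustrative only: the product terms, the 4-bit input field, and the size formulas are our own assumptions and do not describe the Rockwell package or any other real part. A PLA is modelled as an AND plane of product terms feeding an OR plane, and the equivalent ROM is the table obtained by storing one output word for every possible input combination.

```python
from itertools import product

# AND plane: each product term maps input index -> required bit value.
# Inputs not listed in a term are "don't care". These terms are hypothetical.
AND_PLANE = [
    {0: 1, 1: 1},        # term 0
    {0: 0, 2: 1, 3: 0},  # term 1
    {1: 0, 3: 1},        # term 2
]

# OR plane: for each output, the product terms that are ORed together.
OR_PLANE = [
    [0, 2],  # output 0 = term 0 OR term 2
    [1],     # output 1 = term 1
]


def pla(inputs):
    """Evaluate the two-level AND/OR structure for one input combination."""
    terms = [all(inputs[i] == v for i, v in term.items()) for term in AND_PLANE]
    return [int(any(terms[t] for t in selected)) for selected in OR_PLANE]


def rom_from_pla(n_inputs):
    """Equivalent ROM: one stored word for every possible input combination."""
    return {bits: pla(bits) for bits in product((0, 1), repeat=n_inputs)}


def rom_bits(n_inputs, n_outputs):
    return (2 ** n_inputs) * n_outputs


def pla_crosspoints(n_inputs, n_terms, n_outputs):
    # Each term may connect to every input (true or complement) and every output.
    return n_terms * (2 * n_inputs + n_outputs)


rom = rom_from_pla(4)
print(len(rom), "ROM words replace", len(AND_PLANE), "product terms")
print(rom_bits(16, 8), "ROM bits versus", pla_crosspoints(16, 48, 8), "PLA crosspoints")
```

With the assumed dimensions of 16 inputs, 8 outputs, and 48 product terms, the ROM needs a few hundred times as many bits as the PLA has crosspoints. The ROM grows exponentially with the number of inputs, while the PLA grows only with the number of product terms that survive minimization.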


PLAs do not lend themselves to dynamic alterations; there is no natural
addressing mechanism for each of the make-or-break points in the PLA structure. A
dynamically alterable component that could be used much like a PLA is an associative
memory. Associative memories have been touted for some time now as a panacea for
many problems but have yet to prove to be a cost effective unit. However, as the
number of pins per package becomes more of a limitation than the complexity of the
semiconductor circuit itself, associative memories may become viable components; e.g.,
the SPS-41 used an associative memory to specify sophisticated, programmable I/O
patterns that cause an interrupt [SPS, 1972].

The other non-memory semiconductor device that has recently made an
important impact on microprocessors is the shifter. For example, the Signetics 8243
takes an eight bit byte as input, shifts it left from zero to seven positions, zeroing out
the leftmost bits, and presents the shifted byte on eight output pins. Using a package
like the Signetics 8243 as a basic building block, larger shifters can easily be
constructed. The ability to cheaply implement a fast shifter makes variable-length
byte extraction, a common process in emulation, a much easier task.
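
Variable-length field extraction reduces to one shift followed by one mask, which is why a fast shifter matters so much to an emulator. The following is a minimal sketch of that operation and not a model of the Signetics 8243; the function name, the left-to-right bit numbering, and the 16-bit instruction layout in the usage lines are hypothetical.

```python
def extract_field(word, width, start, length):
    """Return `length` bits of a `width`-bit word, starting `start` bits from the left."""
    if start < 0 or length <= 0 or start + length > width:
        raise ValueError("field does not fit in the word")
    shift = width - (start + length)  # number of positions needed to right-justify
    mask = (1 << length) - 1          # keeps only the low `length` bits
    return (word >> shift) & mask


# Hypothetical 16-bit instruction: 4-bit opcode, 3-bit register, 9-bit address.
instruction = 0b1010_011_011000011
opcode = extract_field(instruction, 16, 0, 4)
register = extract_field(instruction, 16, 4, 3)
address = extract_field(instruction, 16, 7, 9)
print(opcode, register, address)
```

Without a shifter that can move a field any number of positions in one step, each of these extractions would take a loop of single-bit shifts in microcode.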

As  will  be  detailed  in  the  next  sections,  these  technology  advances  will  lead  to 
microprogrammable architectures that are more uniform in structure (less ad hoc), easier
to  program  and  can  more  efficiently  emulate  a  wide  variety  of  different  and  more 
complex  instruction  sets. 

3.  THE  PROCESS  OF  EMULATION 

As we stated in the introduction, the right approach to understanding
microprogramming is to examine the task it must perform: emulation. Thus, this
section spells out in detail the task of emulation and through this discussion indicates
the appropriate representational framework and associated operations for efficiently
performing an emulation (interpretation). In the next section we tie together our
observations on emulation and technology to predict the future evolution of
microprogramming.

Our present discussion of emulation and microprogramming is especially
appropriate given the view that a major trend in microprogramming is towards more
generalized emulation in terms of both the number and complexity of machine
languages capable of being efficiently emulated on a single microprogrammable
processor. Recent architectures such as the Burroughs B1700 [Wilner, 1972], which
was  designed  for  efficient  emulation  of  algebraic  block-structure  languages,  and  SAAB 
FCPU  [Lawson  and  Malm,  1973],  which  provides  general  emulation  capabilities  in  a  high 
speed  processor,  are  examples  of  this  more  general  approach  to  emulation.  This  trend 
should  be  heightened  in  the  future  as  the  variety  and  complexity  of  tasks  being 
programmed  on  a  single  processor  continue  to  increase. 

An  interpreter  can  be  characterized  as  a  system  that  carries  out  the  execution 
of  a  program  in  one  representational  framework  by  dynamically  mapping  each 
statement  (instruction),  at  the  point  it  is  to  be  executed,  into  an  execution  sequence  of 
statements  in  another  environment  which  realize  the  semantics  of  the  mapped 
statement.  Given  this  definition  of  interpretation,  emulation  could  be  defined  as  the 
special  case  in  which  the  interpreter  maps  into  an  environment  which  is  directly 
executed  by  the  hardware.  However,  this  type  of  distinction  between  interpretation 
and  emulation  is  often  very  fuzzy.  For  example,  consider  the  interpretation  of  the  IBM 
7090  on  the  IBM  360/65  which  involves  the  use  of  two  environments  [Tucker,  1965], 
i.e.  360/65  microcode  and  360  machine  code  which  is  in  turn  emulated  in  the 
microcode. 

This example also points up the difference between actions which are done
solely for the sake of interpretation control and information (mapping actions) and
those which actually cause the interpreted program to be executed (execution actions)
[Mitchell, 1970]. In this example, mapping actions were programmed in a different
representation environment than execution actions, respectively 360/65 microcode and
360 machine language. As will be discussed later, the appropriate environments for
expressing these different types of actions and the interface between them together
form one of the keys to understanding the evolution of microprogrammable processors
and how the emulation task differs from other computational tasks. For example, the SAAB FCPU
explicitly recognizes the distinction between mapping and execution actions by
providing separate, asynchronous processing elements for each type of action.
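
The distinction between mapping and execution actions can be illustrated with a very small interpreter. This is a sketch under our own assumptions: the accumulator machine, its opcode numbering, and its 16-bit instruction format (8-bit opcode, 8-bit address) are hypothetical and do not correspond to any processor discussed in this article.

```python
def map_instruction(state):
    """Mapping actions: fetch, decode, and advance the emulated program counter."""
    word = state["mem"][state["pc"]]
    state["pc"] += 1
    opcode, operand = word >> 8, word & 0xFF
    return opcode, operand


# Execution actions: each routine realizes the semantics of one emulated opcode.
def op_load(state, addr):
    state["acc"] = state["mem"][addr]


def op_add(state, addr):
    state["acc"] = (state["acc"] + state["mem"][addr]) & 0xFFFF


def op_store(state, addr):
    state["mem"][addr] = state["acc"]


HALT = 0
EXECUTE = {1: op_load, 2: op_add, 3: op_store}


def emulate(state):
    """Alternate mapping and execution actions until the emulated program halts."""
    while True:
        opcode, operand = map_instruction(state)
        if opcode == HALT:
            return state
        EXECUTE[opcode](state, operand)


mem = [0] * 32
mem[0:4] = [0x0110, 0x0211, 0x0312, 0x0000]  # LOAD 16; ADD 17; STORE 18; HALT
mem[16], mem[17] = 5, 7
final = emulate({"pc": 0, "acc": 0, "mem": mem})
print(final["mem"][18])
```

Only the three `op_` routines change the emulated data state. Everything in `map_instruction` and the dispatch through `EXECUTE` is overhead incurred for the sake of interpretation, and it is this overhead that generalized decoding hardware aims to reduce.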

The other key to understanding the emulation process is based on a static view
of this process in contrast to the dynamic view in terms of mapping and execution
actions so far presented. A static view of emulation comes from understanding the
relationship between the two environments the emulator operates on, i.e. the emulated
and execution environments. An environment consists of: (1) a data and control state
image which includes, for example in a conventional processor, its set of working
registers (accumulator, index register, program counter, interrupt register, etc.) and its
main memory which holds data and program; (2) a set of primitive actions which can be
used to modify and test the state image; and (3) a set of control rules which decide,
based on the current status of the control state image, the sequence of primitive
actions to execute. The ease with which each of these aspects of an environment to
be interpreted can be imbedded into the corresponding aspects of the "execution"
environment is one of the determiners of the efficiency of the interpretation
process.

The state diagram of one step in the emulation process, Figure 3.1, represents
both static and dynamic aspects of the emulation process. The lefthand side of the
diagram represents the effect of executing an instruction of the emulated computer on
the state image of the emulated computer. The righthand side represents the
transformations that the microprogrammed processor must perform on its
state image in order to emulate this instruction. In terms of this diagram, efficient
emulation occurs when:

1.  The  data  and  control  state  image  of  target  (emulated)  machine  can  be 
easily  imbedded  into  host  (microprogrammed  processor)  machine; 

2.  The  decoding  and  control  sequencing  function  can  be  implemented 
efficiently.  (In  conventional  instruction  sets  most  of  the  work 
involves  decoding,  but  in  the  emulation  of  higher-level  languages 
much  less  of  the  total  effort  is  spent  on  decoding.); 

3. Microinstruction semantics can operate on imbedded state image of
emulated machine in the same way the emulated instruction does on
its state image.

In the initial use of microprogrammable processors for emulation, each of these
attributes of efficient emulation could be easily attained because the
environment(s) to be emulated was known before the design of the processor. This
allowed the design of a microprogrammable processor that had a state image and
operations that were compatible with the emulated environment, and a hardwired
version of the mapping action (control and decoding). As unanticipated and more
complex environments began to be emulated a more general approach was needed:

1. a generalized decoding structure;

2. a means of statically reconfiguring, for the duration of an emulation,
the state image, control structure, and primitive operations of the
execution environment so that these aspects more nearly match
those of the emulated environment (see Figure 3.2) [Lesser, 1972];

3. a means of dynamically modifying the microinstruction semantics
based on parameters which are specified in the emulated instruction,
i.e. microinstructions as parameterized templates [Lesser, 1971].
Another way of viewing this requirement is the need for a clean,
efficient interface between the output of mapping actions and the
semantics of execution actions.

These requirements for generalized emulation, together with the technological
advances described in the last section, have led to the following concepts being
incorporated into more advanced microprogrammable processors:

1.  flexible  bit  extraction  and  manipulation  for  generalized  decoding: 

a.  barrel  shifter  and  mask  capability  (B1700  and  FCPU) 

b.  insertion  of  data  in  an  arbitrary  field  of  an  internal  register 
(FCPU) 

2.  the  concept  of  residual  control  as  a  way  of  configuring  the 
environment: 

a.  set  up  gating  patterns  between  registers  and  buses  (QM-1) 

b. set up mode of arithmetic, i.e. 1’s complement, BCD, etc.
(B1700, FCPU)

c. set up word length of data which will be applied to arithmetic
operations, memory accesses and stores (B1700, FCPU)

d. pseudo-interrupt register for embedding control structure of
emulated machine (MLP-900 [Lawson and Smith, 1971])

3.  microinstructions  as  parameterized  templates: 

a.  indirect  address  of  general  registers,  shift  count,  ALU  function 
(MLP-900) 

b. execute-command (B1700, FCPU)
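
The last two concepts in this list, residual control and parameterized templates, can be sketched together. The model below is an illustration under our own assumptions: the `Host` class, its register count, and the `micro_add` operation are hypothetical and are not taken from the B1700, FCPU, QM-1, or MLP-900. Word length and arithmetic mode are held in residual control registers that are set once for the duration of an emulation, while the register operands are named indirectly through parameters filled in by the decode step.

```python
class Host:
    """Hypothetical microprogrammable host with residual control registers."""

    def __init__(self):
        self.regs = [0] * 8
        # Residual control: set once, then implicitly qualifies later micro-ops.
        self.word_length = 16
        self.ones_complement = False
        # Filled in by the mapping (decode) step for each emulated instruction.
        self.params = {"src": 0, "dst": 0}

    def micro_add(self):
        """Template micro-op: dst <- dst + src, with operands named indirectly."""
        mask = (1 << self.word_length) - 1
        total = self.regs[self.params["dst"]] + self.regs[self.params["src"]]
        if self.ones_complement and total > mask:
            total = (total & mask) + 1  # end-around carry
        self.regs[self.params["dst"]] = total & mask


host = Host()
host.word_length = 8                   # configure for an 8-bit emulated machine
host.regs[1], host.regs[2] = 200, 100
host.params = {"src": 1, "dst": 2}     # as if decoded from the emulated instruction
host.micro_add()
print(host.regs[2])
```

The same `micro_add` microinstruction serves emulated machines of different word lengths and arithmetic modes; only the residual control settings and the decoded parameters change.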

This list of features, when taken as a whole, sheds some light on the
appropriate components of an environment (microprogrammed processor architecture)
for general purpose emulation:

1.  a  primitive  unit  of  information  which  is  the  bit  string. 

2.  a  capability  for  dynamically  reconfiguring  both  the  internal  and 
external  environment  of  a  microprogrammable  processor,  i.e.  word 
width,  number  of  general  registers,  control  structures,  register 
bussing  connections,  arithmetic  mode,  etc. 

3.  a  capability  for  constructing  complex  address  mapping  functions. 

These are capabilities that are desirable in almost all types of computer environment.
The important point is that they are crucial for effective emulation, i.e. when comparing
emulation with other task domains, these features should be viewed as differing in
degree of importance rather than in specific function.

Future microprogrammable processors will inevitably incorporate more
generalized versions of these concepts as technology permits. However, the aspect of
microcomputer architectures that will probably receive the most attention in the next
10 years is their control structure. The control structure will play a more important
role in future years because one of the dominant trends in programming languages is
towards more complex control structures (i.e. coroutines, data flow models, parallelism,
etc.). Inevitably, these more complex control structures in future programming
languages will be reflected in the machine languages into which they will be compiled.

4.  HARDWARE  AND  SOFTWARE  INTERPRETATION  TECHNIQUES 

To  predict  the  future  of  microprogramming  it  is  necessary  to  understand  how 
hardware  and  software  techniques  are  used  in  effecting  interpretation.  Then, 
advances  in  technology  can  be  related  to  advances  in  techniques  and,  hence,  to 
resultant  advances  in  computer  systems.  Since  microprogramming  is  simply  a  variation 
of  conventional  programming  in  terms  of  the  desire  for  generality  and  ease  of  coding, 
advances  in  microprogramming  will  likely  follow  the  same  pattern  already  seen  in 
assembly level programming over the last twenty-five years. This is especially true
given the trend toward more complex and varied instruction sets which will require
writing of many large emulators, each supporting a complex run time environment, e.g.
PL/I machine, operating systems machine, etc. Since emulation is the major application
of  microprogramming,  specific  programming  support  will  be  accented.  With  advances  in 
technology  offering  more  storage  capacity  and  functional  processing  per  unit  area  (at 
low  cost),  hardware  structures  will  become  more  flexible  thus  providing  a  general 
environment  for  interpretation  and  emulation.  Since  sections  of  general  structures 
usually  go  unused  in  any  single  application,  the  cost  or  cost-performance  of  generality 
is  rarely  acceptable  to  all.  However,  the  added  cost  of  generality  may  be  borne  by 
improved  technology  thus  providing  the  user  with  more  functional  capability  at  a 
constant  cost.  In  contradistinction,  the  consumer  market  for  computers  requires  the 
lowest  possible  cost  and,  so,  will  trade  generality  for  cost.  Here,  technology  is  used  to 
lower  cost  while  keeping  the  application  specific. 

In  addition  to  the  techniques  detailed  in  the  last  section  for  general  purpose 
emulation,  there  are  also  techniques  for  making  it  easy  to  microprogram  many  large 
emulators. A list of techniques, in approximate order of increasing generality, includes:

1. More high-speed registers. The pressure to minimize the size of
the processor state is not as strong in microprogrammed processors
as it is in more conventional processors.

2. Larger control stores. Much of the current involuted character of
microprograms is a result of squeezing a complete emulator into a
small space (e.g. 256 words) and more reasonable
(micro)programming will be possible with larger control stores.

3. N-way branches (case statements). The ability to test several
conditions and branch to any of several sections of code which
service them.

4. (Micro)subroutines. The ability to invoke a function or reference
data specified indirectly at a higher level.

5. Memory management. Multiprogramming is already a common
practice. For example, emulators for central processors, several I/O
processors, and microdiagnostics often reside in the same control
store. Problems of protection, relocation, and using overlays or
paging from backing stores are issues of emerging concern in
microprogramming.

6. (Micro)interrupts. Useful when multiple emulations are being run on
the same processor.

The hardware components which initially supported microprogramming were
adequate-speed ROMs and multiplexors. ROMs provide tables to encode, decode, and
sequence  control.  Multiplexors  extract  fields,  assemble  conditions  for  testing  in 
parallel,  and  select  control  information  from  registers  containing  the  higher  level 
instructions  (indirect  control)  rather  than  from  the  microcode  (direct  control).  The  next 
advance  came  with  the  availability  of  high  speed,  random  access,  alterable  memory. 
With  these,  microprograms  are  easily  corrected,  extended,  or  swapped  for  those  which 
provide  different  functions,  for  example,  machine  diagnosis  (microdiagnostics).  More 
recent  advances  in  technology  have  made  available  low  cost,  small  sized  shifters, 
associative  memories,  PLAs,  and  decimal  arithmetic  units.  The  fast  shifter  is  the  most 
important  of  these  since  it  easily  extracts  fields  from  instructions  being  interpreted  or 
data  from  special  formats,  such  as  floating  point  numbers. 

To understand the implication of hardware and software techniques it is
necessary to consider their application. The next section provides detailed examples.
At this point the uses of microprogramming can be decomposed into two dimensions.
The first compares designs by the level of language supported. The range includes
assembly, intermediate, and high level languages. The second dimension orders
machines by the number of environments supported, typically subdivided in two
classes, one and many. Over the last decade the number of environments has
increased and their level has risen from the assembly toward the procedure oriented.
In the past when several environments were provided, one at a time was selectable
from a small, fixed set.

By observing the development of assembly level programming techniques and by
observing the parallel development of microprogramming so far, a reasonable
prediction would be the continuation of the trend. If so, the next step will be the
generalization and sharing of resources at the microprogram level. First, relocation
and protection schemes for alterable microstores will be developed. Then, memory
management and demand paging schemes to effect the ability to run large
microprograms in comparatively little physical space will be included. The dynamic
allocation of microstore address space will probably require a micro-operating system
with fewer tasks than conventional ones but many similarities with respect to space
allocation techniques. To facilitate writing and checkout of so much code, high level
languages designed for microprogramming will be developed, just as they are now
being used more and more as a tool for developing system programs.

To support these advances in microprogramming software, hardware must be
provided. The most important advance on present components is larger microstores
made possible by faster and denser memories. As an alternative to a fast, large
microstore the cache structure could be used to combine a small, very fast primary
microstore with a larger, slower secondary one. Similarly, demand paging requires a
fast swapping medium. This might be provided by a high speed, low capacity solid
state disk with low latency.
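
The two-level microstore described above can be sketched as a small set of page frames backed by a larger store. The following is a minimal model under stated assumptions: the class name, the page and frame sizes, and in particular the least-recently-used replacement policy are our own choices, since no replacement policy is specified here.

```python
from collections import OrderedDict


class PagedMicrostore:
    """Small fast primary store holding recently used pages of a larger secondary store."""

    def __init__(self, secondary, page_size, n_frames):
        self.secondary = secondary   # full microprogram, a list of microwords
        self.page_size = page_size
        self.n_frames = n_frames
        self.frames = OrderedDict()  # page number -> microwords, oldest use first
        self.faults = 0

    def fetch(self, addr):
        """Return the microword at addr, loading its page on a miss."""
        page, offset = divmod(addr, self.page_size)
        if page in self.frames:
            self.frames.move_to_end(page)        # mark as most recently used
        else:
            self.faults += 1
            if len(self.frames) >= self.n_frames:
                self.frames.popitem(last=False)  # evict least recently used page
            start = page * self.page_size
            self.frames[page] = self.secondary[start:start + self.page_size]
        return self.frames[page][offset]


# A 64-word microprogram run through two 8-word frames.
store = PagedMicrostore(list(range(64)), page_size=8, n_frames=2)
for addr in list(range(8)) + [8, 3, 16, 9]:
    store.fetch(addr)
print(store.faults)
```

A microprogram with good locality, such as an emulator that spends most of its time in a small decode loop, would take few faults under such a scheme; the cost of each fault depends on the speed of the swapping medium.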

Given  the  ability  to  execute  so  much  microcode  what  use  might  be  found  for  it? 
Extrapolating  from  today’s  machines  and  keeping  the  needs  of  emulation  in  mind,  one 
natural  application  would  be  to  provide  multiple  programming  environments.  By  this  is 
meant  a  time-shared  computer  system  whose  users  divide  into  classes  each  requiring 
the  same  environment.  Some  of  these  would  be  machine  languages  for  older  machines, 
others would be intermediate, high level (Fortran, PL/I, COBOL), or application oriented.
The high speed shifter is useful in all of these to extract fields. Emulating earlier
machines would be made easier by the use of a programmable PLA or associative
memory (to replace logic not conveniently embedded in memories due to the large
number of inputs). Finally, note that the provision of multiple environments is a
problem in multiprogramming and, eventually, as more environments are desired, in
time-sharing.

5.  MACHINE  SPECIES 

The various microprogrammed processors can be characterized along
evolutionary lines, which in turn roughly correspond to their implementation
complexity. One of the earliest computer implementations, Whirlwind I [Everett, 1951],
formulated the control part as an encoding in a changeable, diode array memory (see
Figure 2.1). From this Wilkes and Stringer extended the encoding, and coined the word
"microprogramming" [Wilkes and Stringer, 1953].

5.1 One Machine, Integrated Control and Data Part

With  the  availability  of  fast,  read  only,  random-access  memories  computer 
processors  with  a  single,  fixed  instruction-set  were  designed.  These  early  designs 
permitted instruction-sets with more complex data-operations (e.g. multiply, divide,
double precision). The most notable design of this type, the IBM System/360 [Blaauw
and Brooks, 1964; Stevens, 1964], was actually a set of about 10 computer models
implementing the same instruction set covering a performance range of about 300 and
a  price  range  of  about  100.  Over  half  of  the  models  were  implemented  using 
programmed  control  interpreters. 

5.2 A Fixed Group of Conventional Instruction-Sets

Given  that  a  single  machine  instruction  set  can  be  implemented  in  a  single 
processor,  the  natural  extension  is  to  implement  several  machines.  The  earliest 
implementations  of  multiple  instruction  sets  in  a  single  physical  machine  used 
conventional  programming.  First  generation,  cyclic  access,  drum  memory  computers 
were  "emulated"  using  higher  speed,  second  and  third  generation  computers  with 
random  access  memories. 

An early and extensive use of multiple, fixed machine emulations occurred with
the  IBM  360  microprogrammed  processors  as  they  were  used  to  implement  the  IBM 
System/360  instruction-set,  the  360  input-output  processor  instruction-sets,  and 
several  models  of  earlier  IBM  computers.  The  design  methodology  of  these  computers 
is  not  well  understood  outside  IBM.  The  design  process  for  these  machines  appears  to 
be:  first  the  primary  machine  (in  this  case  the  360)  is  designed;  the  various  other 
machines  to  be  interpreted  are  then  added  to  the  design  by  installing  their 
idiosyncrasies  (e.g.  carry  and  overflow  conditions,  state,  special  data  path  breaks) 
[Tucker,  1965]. 

5.3 A Variable Group of Conventional Instruction-Sets

Given that a single machine can be built that implements several conventional
instruction sets (sequentially), can a machine that implements several instruction sets,
but on a variable basis, be built? In effect, Standard Computer Corporation attempted
such a design in the IC-model 4 and later the MLP-900 [SCC, 1968; SCC, 1969]. The
main  goal  of  the  MLP-900  was  to  implement  an  IBM  360,  together  with  other  undefined 
machines,  e.g.  PDP-10,  etc.  In  essence,  the  machine  was  designed  with  much  generality 
using  multiple  register  sets,  and  a  two-stage  pipeline  for  instruction  fetching  and 
instruction  execution.  The  variable  parts,  which  cannot  be  emulated  easily  by 
sequencing,  were  brought  to  a  4  position,  multiple  pole,  electronic  switch,  which 
permitted  up  to  4  variable  parts  to  be  selected  by  direct  wiring  on  a  plugboard  array. 
Although  such  an  approach  is  of  academic  interest,  the  mechanical  aspects  of  the 
plugboarding  preclude  the  machine  from  being  interesting  in  a  production  or  economic 
sense.  The  myriad  of  details  associated  with  the  input-output  section  (e.g.  channels, 
device  state  words,  and  transitions)  add  to  the  system  definition  job  more  than  the 
central  processor. 

Currently,  there  are  no  commercially  viable  machines  that  emulate  a  set  of  other 
conventional  type  machines  on  a  variable  basis.  It  appears  that  the  machines  to  be 
emulated  must  be  determined  a  priori,  in  a  fixed  fashion.  Such  a  machine  would  permit 
any  one  machine  to  be  emulated  at  a  given  instant  by  loading  a  memory  with  the 
information  necessary  to  interpret  the  target  machine.  Although  this  has  been  done 
when  a  large  machine  interprets  another  machine,  the  implication  in  such  a  task  is  that 
the  speed  of  emulation  is  essentially  that  of  the  target  machine.  It  appears  the 
necessary hardware for this task will be available in the near future and that such
systems  can  exist  by  1980. 
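
The idea of emulating any one machine at a given instant by loading a memory with the interpretation information can be sketched in software. This is a minimal illustration and not a description of any machine named above: the class `Host`, the two toy target instruction sets, and their opcodes are all hypothetical.

```python
class Host:
    """Toy host whose 'control store' is a replaceable opcode table."""

    def __init__(self):
        self.control_store = {}  # opcode -> routine(host, operand)
        self.acc = 0             # single accumulator shared by all targets

    def load_emulator(self, table):
        """Replace the control store contents, selecting the target machine."""
        self.control_store = dict(table)

    def run(self, program):
        """Interpret (opcode, operand) pairs for the currently loaded target."""
        for opcode, operand in program:
            self.control_store[opcode](self, operand)
        return self.acc


def op_load(host, n):
    host.acc = n


def op_add(host, n):
    host.acc += n


def op_sub(host, n):
    host.acc -= n


# Two hypothetical targets that encode the same operations differently.
TARGET_A = {0x01: op_load, 0x02: op_add, 0x03: op_sub}
TARGET_B = {"L": op_load, "A": op_add, "S": op_sub}

host = Host()
host.load_emulator(TARGET_A)
result_a = host.run([(0x01, 10), (0x02, 5), (0x03, 3)])
host.load_emulator(TARGET_B)
result_b = host.run([("L", 10), ("A", 5), ("S", 3)])
```

The host hardware is unchanged between the two runs; only the loaded table differs. What the sketch cannot show is the speed question raised above, since a table lookup in software says nothing about whether the host matches the target machine's speed.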

5.4.  A  Single  Higher  Level  Language  Interpreter  Machine. 

Since the introduction of higher level algebraic languages (e.g. Algol, Fortran) and more
natural  textual  languages  (e.g.  Cobol)  there  has  been  a  substantial  interest  in  the 
development  of  hardware  that  would  interpret  the  languages  directly.  To  date,  several 
machines  have  been  built  for  single  languages  (using  directly  hardwired  techniques), 
and  a  number  of  machines  have  been  microprogrammed  to  interpret  languages  directly. 
These  designs  have  not  resulted  in  any  particular  insight  about  direct  language 
interpretation.  The  implementations  execute  the  object  target  language  faster  than  the 
non-microprogrammed  counterparts,  and  the  speed  improvements  hold  no  surprises; 
the faster memory of the microcode, together with the small, register transfer
primitives,  provide  the  improvement. 

5.5 Interpreting Many Languages Directly with a Single Machine

To date, only the Burroughs B1700 [Wilner, 1972] has been built with the goal
of either the direct interpretation or compilation and execution of several higher level
languages. In that it is able to interpret the various languages, and encode the object
code in a space of roughly one half that of a conventional small computer (the IBM
System/3), it is successful. However, the execution time is not clear; one would expect
a factor of two increase in the execution of the object code, too. There has been no
attempt to compare the execution time on a technology-normalized basis. The B1700
is also used for the interpretation of several conventional machines (e.g.
1401 and Burroughs B2500). Considering all factors, the B1700 appears to be
the most general* of the microprogrammed machines in existence.

5.6  Special  Purpose  Machines 

An especially interesting evolution of microprogrammed machines has occurred
in the manipulation of array data for matrix and vector operations, including time
series (e.g. Fourier transformation). Although there were several early
laboratory processors, IBM's 2938 performs this function. Most recently a 3
processor system for these operations has been developed and is attached as a
peripheral to a conventional minicomputer [SPS, 1972]. The three processors are
used in parallel for: fetching data from the attached computer, collecting analog
inputs, and storing the results back; moving data from the local array in the right order
for the arithmetic part; and the arithmetic part.

6.  CONCLUSIONS  AND  FORECASTS 

In this article we have reviewed the most important constraints within which
microprogrammed processors must operate: semiconductor technology and
emulation. It is our view that these constraints influence the direction of future
processors more strongly than microprogramming per se. In fact, as we stated in the
introduction, there is a case for dropping the term microprogramming altogether and
simply saying that processors are designed to efficiently emulate the instruction set of
target machine architectures.

The major impact of semiconductor technology on microprogramming is to
provide larger and faster storage. However, the emergence of programmable logic
arrays and fast shifters is also bound to have a significant effect on microprocessors.
It is not clear at this point exactly what effect the processor-on-a-chip will
have. Placing the processor on a single semiconductor chip eliminates
much of the flexibility available in constructing processors from MSI/LSI components,
but on the other hand, provides a complete processor at a very low cost. If
processors-on-a-chip become sufficiently popular, emulation will come into heavy use
as these primitive processors are surrounded by emulation routines to transform them
into processors with which we are comfortable working.


*As measured by ability to access any bit in memory, to have arbitrary length
microcode in any memory, and to operate on variable length fields with both binary and
BCD formats.


Our  review  of  the  requirements  of  the  emulation  task  pointed  to  a  number  of 
central  concepts  that  are  required  for  efficient  emulation. 

Table 6.1 summarizes the major dimensions of emulation for different levels of
target machines. In each cell the importance of each subtask is indicated and new
concepts or capabilities, not used by a subtask at the previous level, are noted.

[Figure 2.4. Current microprogram control units. Recoverable labels: Inputs; Next state; Modify next state (next state modifiers); Outputs; Microinstruction word layout. Notes: * Microprogram address register. ** One possible implementation.]

[Figure 2.5. A programmable logic array (PLA). The internal lines correspond to minterms in a sum-of-products representation of the function.]

Table 6.1  Emulation subtasks for each of the major machine (language) levels.

Level           Sequencing           Instruction  Instruction   Operand             Data
                                     Fetch        Decoding      Accessing           Operations
--------------  -------------------  -----------  ------------  ------------------  --------------
Machine         conditional branch,               fixed format  immediate,          add, multiply,
Language        subroutines                                     indirect, indexed   or, complement
  importance:   medium               high         high          medium              low

Basic           iteration                         simple syntax subscripted         sine, cosine,
                                                                                    matrix ops.
  importance:   medium               medium       low           medium              medium

Fortran, PL/I,  block structure,                                non-rectangular
Algol, etc.     recursion,                                      data structures
                co-routines
  importance:   high                 low          low           medium              medium

Lisp, Snobol,                                                   linked lists
etc.
  importance:   high                 low          low           high                high
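
The five subtasks that head the columns of Table 6.1 can be seen in even a very small interpreter. The following sketch is illustrative only: the three-field instruction format, the operation names, and the memory layout are hypothetical and do not correspond to any target machine discussed in this article.

```python
def emulate(program, memory, index=0):
    """Interpret a toy target program, labelling each emulation subtask.

    Each instruction is a tuple (operation, mode, value). The format,
    the operations, and the halting rule are hypothetical.
    """
    acc = 0
    pc = 0                                    # sequencing state
    while pc < len(program):
        instruction = program[pc]             # instruction fetch
        operation, mode, value = instruction  # instruction decoding (fixed format)

        # Operand accessing: immediate, indirect, indexed.
        if mode == "immediate":
            operand = value
        elif mode == "indirect":
            operand = memory[memory[value]]
        elif mode == "indexed":
            operand = memory[value + index]
        else:
            raise ValueError(f"unknown addressing mode: {mode}")

        # Data operations, plus the conditional branch used for sequencing.
        if operation == "add":
            acc += operand
        elif operation == "or":
            acc |= operand
        elif operation == "branch_if_zero":
            if acc == 0:
                pc = operand
                continue
        else:
            raise ValueError(f"unknown operation: {operation}")
        pc += 1
    return acc


memory = [7, 0, 3, 9]
program = [
    ("add", "immediate", 1),
    ("add", "indirect", 1),
    ("add", "indexed", 1),
]
total = emulate(program, memory, index=2)
```

At the machine language level most of the loop body is spent on fetch and decoding, which matches the "high" entries in the first row of the table; a higher level target would shift the work toward sequencing and operand accessing.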